Comparing between Arabic Text Clustering using K Means and K Mediods
Abstract
In this study we implemented the K-means and K-medoids algorithms in order to make a practical comparison between them. The system was tested against a manually built set of clusters consisting of 242 pre-clustered documents. The results were encouraging for both algorithms, especially K-medoids: average precision and recall were 0.56 and 0.52 for K-means, compared with 0.69 and 0.60 for K-medoids. We also extracted a feature set of keywords in order to improve performance. The results illustrate that both algorithms can be applied to Arabic text, and that the number of examples available for each category, the selection of the feature space, the training data set used, and the value of K can enormously affect the accuracy of clustering.
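As a rough illustration of how the two algorithms differ (a sketch, not the study's implementation), the snippet below contrasts their cluster-representative update steps. K-means uses the coordinate-wise mean, which need not be an actual document. K-medoids picks the member with the smallest total distance to the rest. The 2-D vectors, the Euclidean distance, and the function names `centroid` and `medoid` are assumptions made for the example; the abstract does not state which features or distance measure were used.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """K-means style update: coordinate-wise mean of the cluster members."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def medoid(points):
    """K-medoids style update: the member minimizing total distance to all members."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


# Hypothetical term-weight vectors for one cluster; the last one is an outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (10.0, 10.0)]

print("centroid:", centroid(cluster))  # pulled toward the outlier
print("medoid:  ", medoid(cluster))    # always one of the actual members
```

In this toy cluster the outlier drags the mean away from the four nearby points, while the medoid remains one of them. That robustness to outliers is the usual motivation for K-medoids, though the abstract does not say it explains the reported difference.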
Related resources
Comparing k-means clusters on parallel Persian-English corpus
This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. The question now arises: are its results extendable to other languages? Since the goal of document clustering is grouping of docum...
Comparing Model-based Versus K-means Clustering for the Planar Shapes
In some fields, there is an interest in distinguishing different geometrical objects from each other. A field of research that studies the objects from a statistical point of view, provided they are invariant under translation, rotation and scaling effects, is known as the statistical shape analysis. Having some objects that are registered using key points on the outline...
D Partition Algorithms – A Study and Emergence of Mining Projected Clusters in High-Dimensional Dataset
High-dimensional data poses a major challenge due to the inherent sparsity of the points. Existing clustering algorithms are inefficient when the required similarity measure is computed between data points in the full-dimensional space. In this work, a number of projected clustering algorithms have been analyzed. However, most of them encounter difficulties when clusters hide in subspaces with very...
A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)
Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using metaheuristic algorithms. In this study, we propose a simulated-annealing-based algorithm for K-means in the clustering analysis which we refer it a...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are so widely used in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and hence can become trapped in a local optimum. In this paper...